Memory module and system supporting parallel and serial access modes

ABSTRACT

A memory module can be programmed to deliver relatively wide, low-latency data in a first access mode, or to sacrifice some latency in return for a narrower data width, a narrower command width, or both, in a second access mode. The narrow, higher-latency mode requires fewer connections and traces. A controller can therefore support more modules, and thus increased system capacity. Programmable modules thus allow computer manufacturers to strike a desired balance between memory latency, capacity, and cost.

TECHNICAL FIELD

The subject matter presented herein relates generally to computer memory.

BACKGROUND

Computers include at least one central processing unit (CPU) that follows instructions and manipulates data. These instructions and data are stored in various types of memory. Managing the flow of information between the CPU and the memory requires considerable processing, which would interfere with CPU operation. A memory controller is therefore provided to manage the flow of information between the CPU and the memory. The memory controller can be integrated with the CPU, or can be a separate integrated circuit.

Ideally, a CPU in operation is never starved of instructions or data. If the communication between the CPU and the memory controller is too slow, the CPU can waste valuable time awaiting information. The speed of a memory system has two essential characteristics, latency and bandwidth. “Latency” refers to the delay between a memory request and information delivery, whereas “bandwidth” refers to the amount of information that can be delivered by the memory per unit time.

Processing speed can be heavily dependent upon memory latency. Memory devices and systems have therefore been designed to minimize latency. However, latency is but one variable in system performance, and the cost of achieving low latency can be prohibitive. Processes that make relatively few requests for larger amounts of information can be, for example, more affected by memory bandwidth than by latency. Computer users may tolerate, or even fail to notice, longer latencies where bandwidth or capacity are more significant factors. Computer systems with longer memory latencies would therefore be in demand if the relatively minor loss of performance were accompanied by significant cost savings, countervailing capacity improvements, or both.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts a memory module 100 that can be programmed to deliver relatively wide, low-latency data in a first access mode, or to sacrifice some latency in return for a narrower data width, a narrower command width, or both, in a second access mode.

FIG. 2 depicts module 100 of FIG. 1 connected to a memory controller 200 via a memory channel 205.

FIG. 3 depicts a memory system 300 in which two modules 100, of the type detailed above in connection with FIGS. 1 and 2, are coupled to memory controller 200 via a common, multi-drop memory bus 305.

FIG. 4 depicts a memory system 400 in accordance with an embodiment in which memory module 100, introduced in FIG. 1, is programmed to operate in a mode that communicates data and command signals with a memory controller 405 over a serial bus 410 that is much narrower than bus 205 of FIG. 2.

FIG. 5 provides an example of one of the configurable DQ buffers 125 of FIG. 1.

FIG. 6 is a waveform diagram 600 illustrating how module 100 performs a read operation in the narrow configuration of FIG. 4.

FIG. 7 depicts a memory system 700 in accordance with an embodiment in which two (or more) modules 705 are coupled to a memory controller 710 via a bus 713 in a daisy-chain configuration.

FIG. 8 depicts a memory system 800 in accordance with another embodiment.

FIG. 9 provides an example of one of the configurable DQ buffers 830 of FIG. 8.

FIG. 10 depicts a memory system 1000 in accordance with an embodiment in which two (or more) modules 1005 are coupled to a memory controller 1010 in a daisy-chain configuration.

FIG. 11 depicts a memory system 1100 in accordance with an embodiment in which each of eight memory modules 1105, connected to a memory controller 1110 via a channel 1115, includes both volatile memory devices 1120 and non-volatile memory devices 1125.

The figures are illustrations by way of example, and not by way of limitation. Like reference numerals in the figures refer to similar elements.

DETAILED DESCRIPTION

FIG. 1 depicts a memory module 100 that can be programmed to deliver relatively wide, low-latency data in a first access mode, or to sacrifice some latency in return for a narrower data width, a narrower command width, or both, in a second access mode. The narrow, higher-latency mode requires fewer connections and traces. A single pad-limited controller, illustrated in connection with later figures, can therefore support more modules, and thus increased system capacity. A single type of programmable module thus allows computer manufacturers to strike a desired balance between memory latency, capacity, and cost.

Module 100 includes a number of integrated-circuit (IC) memory devices 105 divided into two ranks R0 and R1. In this example, each rank includes nine 8-bit-wide IC memory devices 105, and each device 105 stores 4 Gbit (512 MByte) of information. One of the nine memory devices 105 in each rank is reserved for error detection and correction (EDC), leaving eight devices for data. Module 100 thus effectively stores 2 × 8 × 512 MB = 8 GB.
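The capacity arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch: the constant and function names are hypothetical, and the values are simply the example figures given for module 100 (two ranks of nine x8, 4-Gbit devices, with one device per rank holding EDC bits).

```python
# Hypothetical constants taken from the FIG. 1 example; not a defined API.
RANKS = 2
DEVICES_PER_RANK = 9        # nine x8 devices per rank
EDC_DEVICES_PER_RANK = 1    # one device per rank holds EDC bits
DEVICE_WIDTH_BITS = 8       # x8 devices
DEVICE_CAPACITY_GBIT = 4    # 4 Gbit per device


def device_capacity_mbyte(gbit: int = DEVICE_CAPACITY_GBIT) -> int:
    """Convert a per-device capacity in Gbit to MByte (binary units)."""
    return gbit * 1024 // 8


def rank_width_bits(devices: int = DEVICES_PER_RANK,
                    width: int = DEVICE_WIDTH_BITS) -> int:
    """Data-path width of one rank, EDC bits included."""
    return devices * width


def usable_capacity_gbyte(ranks: int = RANKS,
                          devices_per_rank: int = DEVICES_PER_RANK,
                          edc_per_rank: int = EDC_DEVICES_PER_RANK) -> float:
    """Capacity available for data, excluding the EDC devices."""
    data_devices = ranks * (devices_per_rank - edc_per_rank)
    return data_devices * device_capacity_mbyte() / 1024


print(device_capacity_mbyte())   # 512
print(rank_width_bits())         # 72
print(usable_capacity_gbyte())   # 8.0
```

The 72-bit rank width is the same figure that sets the 72 DQ signals on module connector 115.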

Module 100 also includes a buffer system 110 that manages communication between memory devices 105 and a memory controller (not shown) via a module connector 115. Buffer system 110 includes at least one memory-buffer IC, in this example a command/address (CA) buffer 120 (also commonly known as a “CA register” chip) and nine data (DQ) buffers 125. Each DQ buffer 125 is an 8-bit bidirectional data buffer that supports a differential strobe (DQS) signal (not shown). An 8-bit bus connects each buffer 125 to module connector 115, and each of the internal 8-bit DRAM data interfaces DQi connects to one of the x8 DRAM devices 105 in one of ranks R0 and R1. Note that in some embodiments the DRAM devices are 4 bits wide rather than 8 bits wide (i.e., each rank on the 72-bit module would consist of eighteen x4 DRAMs instead of nine x8 DRAMs as shown in FIG. 1), and in such embodiments each of the nine DQ buffers 125 would have dual x4 interfaces rather than a single x8 interface. Each buffer 125 additionally supports a configurable interface (FIG. 5) that is connected to CA buffer 120 via either a left private bus PriL or a right private bus PriR. These private busses convey control signals, a voltage reference, and a differential clock signal to buffers 125. Private busses of this type are sometimes referred to as “BCOM” busses. Also, in some embodiments the PriL and PriR busses are combined into a single “T-shaped” bus that originates at CA buffer 120 and then splits left and right. In addition, and in support of the narrow mode, CA buffer 120 and DQ buffers 125 can be configured to convey bidirectional data between one another via this private bus interface.

Module connector 115 includes nine groups of eight parallel DQ input/output (I/O) pins, for a total of 72 DQ signals. (As used herein, the term “data” refers to the information conveyed over these module pins, and is not necessarily descriptive of the type of information so conveyed.) Though not shown, each group of DQ I/O pins is accompanied by a pair of complementary data strobe (DQS/DQS#) signals that convey timing information for the group. Connector 115 also extends to a CA primary interface 130 on CA buffer 120. Via this interface, the CA buffer receives a true and complement clock pair CK/CK# and command and address signals CAext[31:0]. CA primary interface 130 includes a configurable command port 133 that can be programmed, depending upon the contents of mode register 145, to support different signal widths and communication protocols.

CA buffer 120 can configure DQ buffers 125 to provide a width-configurable data port. In a wide mode, each DQ buffer 125 sends and receives data on all eight DQ signal paths. In a narrow mode, each DQ buffer 125 sends and receives relatively narrow serial data, either via a subset of the DQ signal paths or via one of private busses PriL and PriR. In the latter case, the external data interface used in the wide mode has an effective data width of zero, with data flowing to and from module 100 via CA buffer 120.

The internal and external data signals DQi and DQ are single-ended and the strobe and clock signals DQS/DQS# and CK/CK# are complementary in this example, but other embodiments will use different types or combinations of types of signaling schemes for the various internal and external connections. In this context, “internal” signal lines are those that communicate wholly within and between devices on module 100, and “external” lines convey signals to and from module 100. All of the data, strobe, command, address, and clock signals in this wide mode can be electrically and logically compatible with known DRAM protocols, which are well understood by those of skill in the art.

CA buffer 120 includes control and data logic 135 to manage the flow of information on module 100, a secondary physical interface 140 that serves as an internal command interface to communicate command and address signals via a secondary command bus CA2nd to DRAM devices 105, and a mode register 145 to store and present a mode signal M to logic 135. CA buffer 120, responsive to a configuration command from the memory controller, loads mode register 145 with a value indicative of either the wide, low-latency mode or the relatively narrower and higher-latency mode.
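As a behavioral illustration only, the Python sketch below models how a configuration command might load mode register 145 and how the resulting mode selects the signal lanes used on the module connector. The class name, the integer command encoding, and the narrow-mode lane counts (two CAx lanes, two DQx lanes, no external DQ) are assumptions drawn loosely from the FIG. 4 embodiment, not a specified register format.

```python
from enum import Enum


class Mode(Enum):
    WIDE = 0    # wide, low-latency, parallel-access mode (FIG. 2)
    NARROW = 1  # narrow, higher-latency, serial mode (FIG. 4)


class CABufferModel:
    """Hypothetical behavioral model of CA buffer 120 and mode register 145."""

    def __init__(self, num_dq_buffers: int = 9) -> None:
        self.mode_register = Mode.WIDE  # assumed power-on default
        self.dq_buffer_modes = [Mode.WIDE] * num_dq_buffers

    def configuration_command(self, value: int) -> None:
        """Load mode register 145, then forward the mode to every DQ buffer
        (standing in for the private busses PriL and PriR)."""
        self.mode_register = Mode(value)  # raises ValueError for unknown codes
        self.dq_buffer_modes = [self.mode_register] * len(self.dq_buffer_modes)

    def port_widths(self) -> dict:
        """Signal lanes used on module connector 115 in the current mode."""
        if self.mode_register is Mode.WIDE:
            return {"CAext": 32, "DQ": 72, "CAx": 0, "DQx": 0}
        # Narrow mode: command port 133 carries CAx (commands, addresses,
        # write data) and DQx (read data); external DQ pins go unused.
        return {"CAext": 0, "DQ": 0, "CAx": 2, "DQx": 2}


ca = CABufferModel()
ca.configuration_command(Mode.NARROW.value)
print(ca.port_widths())
```

Keeping the mode in a single register and broadcasting it to the DQ buffers mirrors the text's point that one configuration command reconfigures the whole buffer system.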

FIG. 2 depicts module 100 of FIG. 1 connected to a memory controller 200 via a memory channel 205. In this example, module 100 is configured to operate in a first mode, a wide, low-latency, parallel-access mode. Module 100 includes a “DDR4 LRDIMM chipset” in which buffer system 110, which interfaces with memory devices 105, provides complete buffering of command, address, clock, and data signals. DQ buffers 125 are disposed across the bottom of module 100 to minimize stub lengths and concomitant skew between data bits. Memory controller 200 can be part of a central processing unit, or can be a separate integrated circuit.

The operation of module 100 is consistent with that of LRDIMM server components that employ DDR4 memory. Those of skill in the art are familiar with such operation, so a detailed treatment is omitted. Briefly, CA buffer 120 registers and re-drives clock signal CK/CK# and command and address signals from controller 200 to address and control DRAM devices 105, with primary interface 130 providing load isolation for these signals. Address and control signals arrive via configurable command port 133, which in this mode is configured to support a parallel, 32-bit-wide command connection CAext[31:0]. Logic 135 interprets these signals (e.g., in a manner consistent with the DDR4 specification) and conveys them to devices 105 via secondary bus CA2nd.

DQ buffers 125 provide load isolation for read, write, and strobe signals to and from devices 105, and receive control signals via private busses PriL and PriR, e.g., to prepare them for the direction of data flow. The private busses also convey mode-selection information that can alter the way buffers 125 convey data. Some such configuration options are detailed below.

Counting the 72 data signals, the true and complement strobes, and the 32 bits of CA information, memory channel 205 includes about one hundred forty electrical connections on a motherboard (not shown) that supports both controller 200 and module 100. To interface to this memory channel, controller 200 includes about two hundred fifty input/output pins, including those for power and ground. Assuming controller 200 supports four independent memory channels 205, each supporting two modules 100, controller 200 requires about one thousand input/output pins to accommodate eight memory modules.
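The pin-budget estimate above reduces to a multiplication. A minimal sketch, assuming the approximate per-channel pin count given in the text; the constant and function names are hypothetical.

```python
PINS_PER_WIDE_CHANNEL = 250  # approx. controller I/O per channel 205, incl. power/ground
CHANNELS = 4                 # independent memory channels
MODULES_PER_CHANNEL = 2      # two modules 100 per multi-drop channel


def controller_pin_budget(channels: int = CHANNELS,
                          pins_per_channel: int = PINS_PER_WIDE_CHANNEL,
                          modules_per_channel: int = MODULES_PER_CHANNEL) -> tuple:
    """Return (controller I/O pins, supported modules) for the wide mode."""
    return channels * pins_per_channel, channels * modules_per_channel


print(controller_pin_budget())  # (1000, 8)
```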

FIG. 3 depicts a memory system 300 in which two modules 100, of the type detailed above in connection with FIGS. 1 and 2, are coupled to memory controller 200 via a common, multi-drop memory bus 305. DQ buffers 125 and other details are omitted for ease of illustration, and internal and external data lines DQi and DQ are conflated into paths marked DQ/i. Memory controller 200 uses address information to arbitrate between the two modules 100. Multi-drop buses of this type are well known, and facilitate easy memory extensibility. However, each module creates impedance discontinuities that tend to degrade signaling speed, and thus memory latency and bandwidth. The buffering of data and control signals ameliorates this problem.

FIG. 4 depicts a memory system 400 in accordance with an embodiment in which memory module 100, introduced in FIG. 1, is programmed to operate in a mode that communicates data and command signals with a memory controller 405 over a serial bus 410 that is much narrower than bus 205 of FIG. 2. Module 100 of FIG. 4 suffers longer latency than does the embodiment of FIG. 2, but advantageously requires far fewer pins and connections. There is a practical limit to the number of I/O pins on a given memory controller, so the reduced width of bus 410 allows for more memory modules, and consequently more storage, per memory controller. From a system perspective, the need for fewer or less complex controllers can significantly reduce the cost per unit of storage. Moreover, where the narrow mode is used to increase the number of modules in a given system, the resultant higher capacity can considerably curb accesses to secondary memory, and thus reduce access latency from a system perspective. For example, systems that incorporate narrow modules of this type can be optimized for use with main-memory databases or memory appliances to provide relatively faster and more predictable performance than similar systems that rely more heavily on disk-based storage.

Memory controller 405 can be a standard microprocessor or a custom integrated circuit, but in one embodiment is a field-programmable gate array (FPGA) programmed as needed to implement memory-control functionality. Serial bus 410 includes differential clock lines CK/CK#, as in prior embodiments, but the external command and address lines are configured differently. In this example, configurable command port 133 is narrowed to just eight lines, four for a pair of differential serial command/address signals CAx[1:0] and four for a pair of data lines DQx[1:0]. Paths CAx[1:0] are unidirectional, and can carry memory commands, addresses, and write data WR to CA buffer 120. Paths DQx[1:0] are also unidirectional, and carry read data RD and status or control information from CA buffer 120 to controller 405. DQ buffers 125 are configured to communicate read and write data via private busses PriL and PriR; the seventy-two single-ended signals that convey external DQ signals, and the nine associated pairs that convey DQS signals, are not used in this mode. The number of connections used to convey commands is also reduced considerably in comparison with the low-latency mode of FIGS. 2 and 3.

The functional changes required of module 100 to go from the wide mode of FIG. 2 to the narrow mode of FIG. 4 are initiated by loading mode register 145 with a value indicative of the narrow mode. Register 145 shares a mode signal M with control and data logic 135 and primary interface 130. Logic 135 conveys other mode information to each DQ buffer 125 to configure those buffers to communicate data via private busses PriL and PriR in lieu of the external DQ connections.

Memory controller 405 delivers write data, commands, and addresses to command port 133 via lines CAx[1:0], and receives read data via lines DQx[1:0]. The widths of these connections can be different in other embodiments. In this embodiment, for example, two of the pins on the CA buffer are modally repurposed to support the serialized CAx[1:0] interface, which carries command, control, and address information from the memory controller. These commands are deserialized by configurable command port 133, and logic 135 directs the resultant commands to DRAM devices 105 via secondary command interface CA2nd using normal DDR4 protocols. When devices 105 respond to the commands (e.g., in a read transaction), their data transmissions are conveyed over internal data bus DQi and captured by DQ buffers 125, which interface with CA buffer 120 via private busses PriL and PriR. CA buffer 120 aggregates the read data from buffers 125 and delivers the resultant read data to the memory controller via the DQx interface. In this example, the DQx interface includes two modally redefined pins. Write transactions are handled similarly, in reverse.

Both paths CAx[1:0] and DQx[1:0] can be electrically and logically incompatible with the communication protocols required by DRAM devices 105 because DQ buffers 125 facilitate communication with DRAM devices 105. In this way, a DDR4 module that is fully compatible with the load-reduced DDR4 standard can support a mode that requires only a six-wire interface (two pins for a clock reference, two for command/address, and two for data) on the memory controller, rather than a one hundred forty-wire interface. For the same processor pin budget and cost budget, therefore, more than 20× as many memory modules can be serviced. The tradeoff for this serialization mode is access latency. In one embodiment, for example, module 100 provides about 50 ns of access latency in the wide mode of FIG. 2 and about 200 ns of access latency in the narrow mode.
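The “more than 20×” figure and the latency tradeoff can be checked the same way. This sketch uses the approximate wire counts and example latencies stated above and, as a simplifying assumption, ignores power and ground pins; the names are hypothetical.

```python
WIDE_WIRES = 140        # approx. signal connections for wide channel 205
NARROW_WIRES = 6        # 2 clock + 2 command/address + 2 data
WIDE_LATENCY_NS = 50    # example wide-mode access latency
NARROW_LATENCY_NS = 200 # example narrow-mode access latency


def module_multiplier(wide: int = WIDE_WIRES, narrow: int = NARROW_WIRES) -> int:
    """Whole number of narrow interfaces that fit in one wide interface's wires."""
    return wide // narrow


def latency_penalty(wide_ns: float = WIDE_LATENCY_NS,
                    narrow_ns: float = NARROW_LATENCY_NS) -> float:
    """Factor by which access latency grows in the narrow mode."""
    return narrow_ns / wide_ns


print(module_multiplier())  # 23
print(latency_penalty())    # 4.0
```

In this example, roughly 23 times as many modules come at the cost of about four times the access latency.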

In addition to increasing latency, serializing the data can slow read and write speeds. However, controller 405 can simultaneously access additional modules 100 to make up for such losses in memory bandwidth. In examples in which the memory bandwidth of serial busses 410 is considerably lower than that of the parallel mode, the relative paucity of per-module data traffic in the narrow mode allows module 100 additional time for, e.g., error checking. Bus 410 can be implemented using point-to-point connections for improved speed performance.

In some embodiments, buffer system 110 can support one or more low-power modes in which some number of modules, or DRAM devices on one or more modules, are disabled. Even empty DRAM consumes power, largely to refresh the contents of empty storage locations. Controller 405 can communicate with buffer 120 on each module to manage power settings, including the number of active DRAM devices 105, DQ buffers 125, or both. Unused modules 100 can thus be kept offline, or at least in a low-power mode, until their storage resources are needed. Controller 405 can be given full control over memory mapping to facilitate this and other functionality.
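One way a controller might decide which modules to keep online is sketched below. The policy (activate the fewest modules that cover the required capacity and leave the rest in a low-power state) and the helper name are assumptions for illustration; the 8 GB per-module default is the FIG. 1 example.

```python
import math
from typing import List


def plan_power_states(required_gbyte: float, num_modules: int,
                      module_gbyte: float = 8.0) -> List[str]:
    """Keep the fewest modules active that cover the required capacity
    (hypothetical policy); the remainder stay in a low-power state."""
    needed = math.ceil(required_gbyte / module_gbyte)
    if needed > num_modules:
        raise ValueError("requested capacity exceeds installed modules")
    return ["active"] * needed + ["low-power"] * (num_modules - needed)


states = plan_power_states(required_gbyte=20, num_modules=8)
print(states.count("active"), states.count("low-power"))  # 3 5
```

A real controller would also weigh bandwidth, since the text notes that narrow-mode bandwidth losses can be offset by accessing more modules at once.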

FIG. 5 provides an example of one of the configurable DQ buffers 125 of FIG. 1. Buffer 125 includes three interfaces 500, 505, and 510, a command decoder 515, a register 520, and a multiplexer/demultiplexer 525. Private bus PriR includes a command interface PriC to issue instructions to buffer 125 and a bidirectional data interface PriDQ.

At start up, CA buffer 120 (FIG. 1) issues commands to decoder 515 to load register 520 with a value indicative of the desired operational mode. In the wide mode, detailed above in connection with FIG. 2, register 520 controls multiplexer/demultiplexer 525 to interconnect interfaces 505 and 510 so that external data signals DQ are communicated as internal data signals DQi. Data interface PriDQ is not used. In the narrow mode, detailed above in connection with FIG. 4, register 520 controls multiplexer/demultiplexer 525 to route internal data signals DQi via interface PriDQ. External data connection DQ is not used in this example.
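The routing decision made by register 520 and multiplexer/demultiplexer 525 can be pictured with a small behavioral model. The class, the command names accepted by decoder 515, and the string labels for the interfaces are all hypothetical; the text does not define a command encoding.

```python
from enum import Enum
from typing import Tuple


class BufferMode(Enum):
    WIDE = "wide"
    NARROW = "narrow"


class DQBufferModel:
    """Hypothetical behavioral model of DQ buffer 125 (FIG. 5)."""

    # Assumed command names for decoder 515; the text defines no encoding.
    COMMANDS = {"SET_WIDE": BufferMode.WIDE, "SET_NARROW": BufferMode.NARROW}

    def __init__(self) -> None:
        self.register_520 = BufferMode.WIDE

    def decode(self, command: str) -> None:
        """Command decoder 515: load register 520 from a PriC command."""
        if command not in self.COMMANDS:
            raise ValueError(f"unknown command {command!r}")
        self.register_520 = self.COMMANDS[command]

    def route_read(self, dqi_byte: int) -> Tuple[str, int]:
        """Multiplexer/demultiplexer 525: choose the interface that forwards
        a byte of read data arriving on internal interface DQi."""
        interface = "DQ" if self.register_520 is BufferMode.WIDE else "PriDQ"
        return interface, dqi_byte & 0xFF


buf = DQBufferModel()
wide_route = buf.route_read(0xA5)    # ('DQ', 165)
buf.decode("SET_NARROW")
narrow_route = buf.route_read(0xA5)  # ('PriDQ', 165)
```

Write data would follow the mirror path, from DQ or PriDQ toward DQi, under the same register setting.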

FIG. 6 is a waveform diagram 600 illustrating how module 100 performs a read operation in the narrow configuration of FIG. 4. First, module 100 receives an exemplary 64-bit read command RD on interface CAx. Primary interface 130 converts this serial command to parallel and issues the command to control and data logic 135, which issues successive activate (A) and read (R) commands to devices 105 on secondary command bus CA2nd. Devices 105 respond by delivering a 288-bit read burst Q across the 72 data lines of internal data bus DQi (assuming each device 105 has a burst length of 4). Each of the nine DQ buffers 125 conveys its respective 32 bits of read data over one of private busses PriL and PriR. In this embodiment, each 32-bit parcel is delivered as a burst of eight bits on each of four data lines of the respective private bus. Control and data logic 135 aggregates the nine 32-bit data parcels before delivering 288 bits of read data and an optional header to primary interface 130, which serializes the header and data before transmitting them to the memory controller. A header might, for example, be prepended that includes the 64-bit address associated with the data burst, to simplify transactional ordering within the memory controller. Alternatively or additionally, CRC information about the address and/or data burst could be included in a header to help fortify the reliability of data delivery, and other status-related signals could be included within predefined header fields.
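The data-path arithmetic of FIG. 6 (nine 32-bit parcels, each carried as eight bits on each of four private-bus lines, then aggregated and serialized onto the two DQx lanes) can be illustrated with the following sketch. All function names, the bit-to-line mapping, the bit ordering, and the placement of a 64-bit address header in the low-order bits so that it is transmitted first are assumptions for illustration; the text does not specify them.

```python
from typing import List, Optional

NUM_DQ_BUFFERS = 9
PARCEL_BITS = 32                                  # 8 bits x burst length 4
PRIVATE_BUS_LINES = 4
BITS_PER_LINE = PARCEL_BITS // PRIVATE_BUS_LINES  # 8 bits on each line
DATA_BITS = NUM_DQ_BUFFERS * PARCEL_BITS          # 288
HEADER_BITS = 64                                  # e.g., the burst's address


def parcel_to_lines(parcel: int) -> List[List[int]]:
    """DQ-buffer side: spread a 32-bit parcel over four private-bus lines.
    Bit i travels on line i % 4 in beat i // 4 (assumed mapping)."""
    lines = [[0] * BITS_PER_LINE for _ in range(PRIVATE_BUS_LINES)]
    for i in range(PARCEL_BITS):
        lines[i % PRIVATE_BUS_LINES][i // PRIVATE_BUS_LINES] = (parcel >> i) & 1
    return lines


def lines_to_parcel(lines: List[List[int]]) -> int:
    """CA-buffer side: reassemble a parcel captured from the private bus."""
    parcel = 0
    for i in range(PARCEL_BITS):
        parcel |= lines[i % PRIVATE_BUS_LINES][i // PRIVATE_BUS_LINES] << i
    return parcel


def aggregate(parcels: List[int], address: Optional[int] = None) -> int:
    """Concatenate nine parcels into 288 data bits. If an address is given,
    it occupies the low 64 bits so that it is serialized first (a header)."""
    if len(parcels) != NUM_DQ_BUFFERS:
        raise ValueError("expected one parcel per DQ buffer")
    data = 0
    for index, parcel in enumerate(parcels):
        data |= (parcel & 0xFFFFFFFF) << (PARCEL_BITS * index)
    if address is None:
        return data
    return (address & ((1 << HEADER_BITS) - 1)) | (data << HEADER_BITS)


def serialize(word: int, total_bits: int, lanes: int = 2) -> List[List[int]]:
    """Primary-interface side: deal bits, least significant first, across
    the serial DQx lanes (assumed ordering)."""
    streams: List[List[int]] = [[] for _ in range(lanes)]
    for bit in range(total_bits):
        streams[bit % lanes].append((word >> bit) & 1)
    return streams


parcels = [0x11111111 * (n + 1) for n in range(NUM_DQ_BUFFERS)]
captured = [lines_to_parcel(parcel_to_lines(p)) for p in parcels]
word = aggregate(captured, address=0x1234)
streams = serialize(word, HEADER_BITS + DATA_BITS)
```

With a header, each of the two lanes carries 176 bits; error-correction or CRC fields would simply widen the header.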

Where the read data includes error-correction bits, control and data logic 135 can perform error correction on the module, or can simply deliver the error-correction and data bits to the memory controller. Likewise, error-correction bits can be derived either on or off module 100. If error detection and correction is performed on the module, a summary of those operations could be included within the optional header fields.

FIG. 7 depicts a memory system 700 in accordance with an embodiment in which two (or more) modules 705 are coupled to a memory controller 710 via a bus 713 in a daisy-chain configuration. In this context, a “daisy chain” is a wiring scheme in which multiple modules are wired together in sequence. In this configuration, controller 710 interfaces with the first module, which in turn interfaces to the second, which in turn interfaces to the third, etc., all using a relatively narrow (e.g., 12-wire) interface for commands, addresses, timing references, and data. As compared with the embodiment of FIG. 4, this connectivity reduces controller pin requirements even further.

Each module 705 includes a CA buffer 715 and nine memories 720. CA buffer 715 is like CA buffer 120 of FIG. 1, but additionally includes support for daisy-chain connectivity to CA buffers on the next module. DQ buffers and various command and data paths are included in modules 705, but are omitted from this illustration for simplicity. Memory controller 710 can schedule many commands to the chain of two or more modules 705 so that the modules 705 can work simultaneously. If desired, the relatively narrow bus width allows controller 710 to have a relatively low pin count or to support additional channels and modules. For example, one twelve-wire channel 713 can serve eight or more modules 705.

With reference to memory channel 713, lines CFM/CFM# convey a unidirectional clock reference that starts at controller 710 (“Clock from master”), flies by all of modules 705, and turns around at the far end of the channel to become signal CTM/CTM# (“Clock to master”). Signals CA/CA# are source-synchronous, differential command and address signals that are connected point-to-point and repeated to downstream modules. Data signals DQ/DQ# are similarly differential, and are conveyed and repeated using point-to-point connections.

System 700 simplifies read/write transactions: transmissions from the memory controller to the modules (e.g., commands and write data) are edge-aligned to signal CFM/CFM# on the like-named signal path, while transmissions from the modules to the controller (e.g., read data) are edge-aligned to signal CTM/CTM#. In this topology, transactions (e.g., reads, writes, refreshes) can be ongoing in each module independently, and transaction data can be collected in round-robin fashion.
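Round-robin collection of transaction data from independently operating modules might look like the sketch below. The per-module completion queues and the function name are hypothetical; the sketch shows only the fairness property (at most one item per module per pass), not the CFM/CTM timing.

```python
from collections import deque
from typing import Deque, List, Tuple


def collect_round_robin(completions: List[Deque[str]]) -> List[Tuple[int, str]]:
    """Gather completed transaction data from each module in turn.

    completions[i] holds data that module i has ready; the controller takes
    at most one item per module per pass, so no module is starved.
    """
    collected: List[Tuple[int, str]] = []
    while any(completions):
        for module, queue in enumerate(completions):
            if queue:
                collected.append((module, queue.popleft()))
    return collected


queues = [deque(["rd A", "rd B"]), deque(["rd C"]), deque()]
order = collect_round_robin(queues)
print(order)  # [(0, 'rd A'), (1, 'rd C'), (0, 'rd B')]
```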

FIG. 8 depicts a memory system 800 in accordance with another embodiment. As in the example of FIGS. 1-7, memory system 800 includes a module 805 that supports a wide, low-latency mode and a relatively narrow, higher-latency mode. Rather than route the data through repurposed command lines, however, the embodiment of FIG. 8 employs an external memory bus 807 with a small subset of data lines DQ (e.g., one serial connection instead of eight parallel connections for each DQ buffer) in the narrow mode.

System 800 includes module 805 and a memory controller 810. In the wide, low-latency mode, module 805 is configured to operate as described previously in connection with FIG. 2. The following discussion details how module 805 is configured to operate in a narrow mode.

Memory controller 810 can be a standard microprocessor or a custom integrated circuit, but in one embodiment is an FPGA programmed as needed to implement memory-control functionality. Signal pair CK/CK# is a unidirectional clock reference, and signal pair CA/CA# represents a unidirectional, narrow (2-8 wires) command and address bus. Each of nine data connections DQ0/# to DQ8/# is bidirectional and narrow (e.g., 2 wires), and carries both read and write data to and from controller 810.

Module 805 includes components in common with module 100 of FIG. 1, with like-labeled elements being the same or similar. Module 805 additionally includes a system of buffers 820 that manage communication between memory devices 105 and memory controller 810 via a module connector 815 that runs along the bottom of module 805 in this illustration.

The system of buffers 820 includes a CA buffer 825 and nine DQ buffers 830. Each DQ buffer is an 8-bit bidirectional data buffer that supports complementary strobe (DQS/DQS#) signals. An 8-bit external bus DQ connects each buffer 830 to the module connector, and the internal 8-bit DRAM interfaces connect to x8 DRAM devices 105 via internal bus DQi. Each buffer 830 additionally supports a configurable DQ interface that can deliver x8 or x1 data, depending upon a mode-selection value delivered over either a left private bus PriL or a right private bus PriR.

In the narrow, relatively high-latency mode, control and DQ logic 840 within CA buffer 825, responsive to a mode value loaded in a mode register, configures primary interface 845 to reduce the command and address bus width to a single point-to-point serial interface CA/CA#ext in this example. Logic 840 also conveys the mode value to DQ buffers 830 via their respective private busses PriL and PriR to configure each of buffers 830 to support a bidirectional, point-to-point serial interface for data, reducing the requisite number of data pins from eight to two, for example. The remaining DQ pins on module 805 are not used in this mode. Module 805 can additionally support the mode of FIG. 4 in other embodiments.

FIG. 9 provides an example of one of the configurable DQ buffers 830 of FIG. 8. Buffer 830 includes four interfaces 900, 905, 910, and 915; a command decoder 920; and a register 925. At start up, CA buffer 825 (FIG. 8) issues commands to decoder 920 to load register 925 with a value indicative of the desired operational mode. In the wide mode, similar to that detailed above in connection with FIG. 2, register 925 enables interface 910 and disables interface 915 so that eight external data signals DQ are communicated as eight internal data signals DQi. In the narrow mode, register 925 enables serial interface 915 and disables interface 910. Interface 915 serializes read data and deserializes write data, communicating data signals over a pair of lines as differential serial signals.
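The serialize/deserialize role of interface 915 can be sketched as follows. Each serial symbol is modeled as a (true, complement) pair to represent the differential line pair; the MSB-first bit order and the function names are assumptions, not details given in the text.

```python
from typing import List, Tuple


def serialize_byte(byte: int) -> List[Tuple[int, int]]:
    """Serialize eight parallel DQi bits into eight differential symbols.

    Each symbol is (true, complement); bits are sent MSB first (assumed).
    """
    symbols = []
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        symbols.append((bit, 1 - bit))
    return symbols


def deserialize_byte(symbols: List[Tuple[int, int]]) -> int:
    """Recover the parallel byte, rejecting symbols whose legs do not differ."""
    if len(symbols) != 8:
        raise ValueError("expected eight serial symbols")
    byte = 0
    for true_leg, comp_leg in symbols:
        if true_leg == comp_leg:
            raise ValueError("invalid differential symbol")
        byte = (byte << 1) | true_leg
    return byte


symbols = serialize_byte(0xA5)
print(symbols[:2])  # [(1, 0), (0, 1)]
```

Read data would take the serialize path toward the module connector, and write data the deserialize path toward DQi.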

FIG. 10 depicts a memory system 1000 in accordance with an embodiment in which two (or more) modules 1005 are coupled to a memory controller 1010 in a daisy-chain configuration. Controller 1010 interfaces with the first module, which in turn interfaces to the second, which in turn interfaces to the third, etc., all using relatively narrow interfaces for commands, addresses, timing references, and data.

Each module 1005 includes a CA buffer 1015 and nine DQ buffers 1020. CA buffer 1015 is like CA buffer 120 of FIG. 1, but additionally includes daisy-chain support that buffers signals to the next module. Memory devices and various command and data paths are included in modules 1005, but are omitted from this illustration for simplicity.

With reference to memory channel 1025, lines CFM/CFM# (“clock from master”) convey a unidirectional clock reference that starts at controller 1010, flies by all of modules 1005, and turns around at the far end of the channel to become signal CTM/CTM# (“clock to master”). Signals CA/CA# are source-synchronous, differential command and address signals that are connected point-to-point and repeated to downstream modules. Data signals DQ0/# through DQ8/# are similarly differential, and are conveyed and repeated using point-to-point connections.

FIG. 11 depicts a memory system 1100 in accordance with an embodiment in which each of eight memory modules 1105, connected to a memory controller 1110 via a channel 1115, includes both volatile memory devices 1120 and non-volatile memory devices 1125. Each module 1105 is a Non-Volatile Dual In-line Memory Module (NVDIMM) that retains data absent electrical power. Such persistent storage can be important for a number of reasons. For example, sophisticated processes may require considerable computing time and resources to generate data, which would have to be reacquired following an intentional or unintentional system shutdown or failure. Even if the lost data were stored elsewhere, such as on a hard disk or in cloud storage, the time required to repopulate modules 1105 might be excessive. Assuming, for example, that each of the eight modules 1105 stores 16 GB, simply reloading up to 128 GB of storage could take an impractically or undesirably long time. Each CA buffer 1130 can therefore store important information in a respective NVM device 1125. In an embodiment in which each module 1105 includes 16 GB of DRAM, an NVM device 1125 of 24 GB can store all the data in volatile memory and use the remaining 8 GB for, e.g., wear leveling.
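The capacity figures in this example follow directly from the stated sizes. A minimal sketch with a hypothetical helper name, using only the numbers given above.

```python
def nvdimm_budget(modules: int = 8, dram_gbyte: int = 16, nvm_gbyte: int = 24) -> dict:
    """Capacity figures for the FIG. 11 example (hypothetical helper)."""
    if nvm_gbyte < dram_gbyte:
        raise ValueError("NVM must be able to hold the full DRAM image")
    return {
        "dram_to_reload_gbyte": modules * dram_gbyte,          # 8 x 16 = 128
        "spare_nvm_per_module_gbyte": nvm_gbyte - dram_gbyte,  # 24 - 16 = 8
    }


print(nvdimm_budget())
# {'dram_to_reload_gbyte': 128, 'spare_nvm_per_module_gbyte': 8}
```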

Each CA buffer 1130 is connected to a respective NVM device 1125 via a high-speed connection, in one example a serial connection such as PCI Express (PCIe). Controller 1110, at the direction of an underlying process, can raise a persistence flag that causes buffer 1130 to store associated information in both volatile DRAM device 1120 and NVM 1125. In examples in which the memory bandwidth of serial busses 410 is considerably lower than that of the parallel mode, the relative paucity of per-module data traffic in the narrow mode allows module 100 additional time for interacting with NVM 1125.
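The effect of the persistence flag can be sketched as a write path that copies flagged data into NVM as well as DRAM. The class, its methods, and the dictionary-backed storage are hypothetical stand-ins for DRAM devices 1120 and NVM 1125; the PCIe link itself is not modeled.

```python
class PersistentModuleModel:
    """Hypothetical model of a module 1105: CA buffer 1130 with DRAM 1120 and NVM 1125."""

    def __init__(self) -> None:
        self.dram: dict = {}  # volatile contents (DRAM devices 1120)
        self.nvm: dict = {}   # persistent contents (NVM device 1125)

    def write(self, address: int, data: bytes, persistent: bool = False) -> None:
        self.dram[address] = data
        if persistent:  # persistence flag raised by controller 1110
            self.nvm[address] = data

    def power_loss(self) -> None:
        self.dram.clear()  # volatile contents are lost

    def restore(self) -> None:
        self.dram.update(self.nvm)  # repopulate DRAM from NVM after power returns


module = PersistentModuleModel()
module.write(0x10, b"checkpoint", persistent=True)
module.write(0x20, b"scratch")
module.power_loss()
module.restore()
print(sorted(module.dram))  # [16]
```

Only flagged data survives, which is why the text leaves the choice of what to persist to the underlying process.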

As discussed above in connection with FIG. 4, some embodiments can place unused modules in a low-power state. Modules 1105 can support the same or similar low-power modes, but can maintain state absent power. Modules 1105 can also support a mid-power mode in which DRAM devices 1120 are powered off without losing access to NVM 1125. System 1100 can thus provide low-power read access for data that is rarely written. CA buffer 1130 can move data from NVM 1125 to DRAM devices 1120 when performance requirements change.
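The three power modes can be modeled as a small state machine. Everything here is an illustrative assumption: the mode names, the rule that writes require full power, and flushing DRAM contents to NVM before DRAM is powered off.

```python
from enum import Enum


class PowerMode(Enum):
    FULL = "full"  # DRAM devices 1120 and NVM 1125 powered
    MID = "mid"    # DRAM devices 1120 off; NVM 1125 still readable
    LOW = "low"    # module idle; contents retained in NVM 1125


class PowerManagedModule:
    """Hypothetical model of a module 1105 with the power modes described above."""

    def __init__(self) -> None:
        self.mode = PowerMode.FULL
        self.dram: dict = {}
        self.nvm: dict = {}

    def write(self, address: int, data: bytes) -> None:
        if self.mode is not PowerMode.FULL:
            raise RuntimeError("writes assumed to require full power")
        self.dram[address] = data

    def read(self, address: int) -> bytes:
        if self.mode is PowerMode.LOW:
            raise RuntimeError("module is in low-power mode")
        if self.mode is PowerMode.MID:
            return self.nvm[address]  # low-power read path via NVM
        return self.dram[address]

    def set_mode(self, mode: PowerMode) -> None:
        if self.mode is PowerMode.FULL and mode is not PowerMode.FULL:
            self.nvm.update(self.dram)  # assumed: save DRAM state before power-off
            self.dram.clear()
        elif self.mode is not PowerMode.FULL and mode is PowerMode.FULL:
            self.dram.update(self.nvm)  # move data from NVM back into DRAM
        self.mode = mode


m = PowerManagedModule()
m.write(1, b"rarely written")
m.set_mode(PowerMode.MID)
dram_in_mid = dict(m.dram)
mid_read = m.read(1)
m.set_mode(PowerMode.FULL)
full_read = m.read(1)
```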

An output of a process for designing an integrated circuit, or a portionof an integrated circuit, comprising one or more of the circuitsdescribed herein may be a computer-readable medium such as, for example,a magnetic tape or an optical or magnetic disk. The computer-readablemedium may be encoded with data structures or other informationdescribing circuitry that may be physically instantiated as anintegrated circuit or portion of an integrated circuit. Although variousformats may be used for such encoding, these data structures arecommonly written in Caltech Intermediate Format (CIF), Calma GDS IIStream Format (GDSII), or Electronic Design Interchange Format (EDIF).Those of skill in the art of integrated circuit design can develop suchdata structures from schematic diagrams of the type detailed above andthe corresponding descriptions and encode the data structures oncomputer readable medium. Those of skill in the art of integratedcircuit fabrication can use such encoded data to fabricate integratedcircuits comprising one or more of the circuits described herein.

While the present invention has been described in connection with specific embodiments, variations of these embodiments will be obvious to those of ordinary skill in the art. Moreover, some components are shown directly connected to one another while others are shown connected via intermediate components. In each instance the method of interconnection, or “coupling,” establishes some desired electrical communication between two or more circuit nodes, or terminals. Such coupling may often be accomplished using a number of circuit configurations, as will be understood by those of skill in the art. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description. Only those claims specifically reciting “means for” or “step for” should be construed in the manner required under the sixth paragraph of 35 U.S.C. §112.

What is claimed is:
 1. A memory module supporting a wide-data mode and a narrow-data mode, the memory module comprising: a module connector; integrated-circuit (IC) memory devices; and at least one memory-buffer IC having: an internal data interface coupled to the memory ICs; an internal command interface coupled to the memory ICs; an external data interface coupled to the module connector, the external data interface including a configurable data port supporting: a first data connection of a first data width to convey data in the wide-data mode; and a second data connection of a second data width, less than the first data width, in the narrow-data mode; and an external command interface coupled to the module connector, the external command interface including a configurable command port supporting: a first command connection of a first command width to convey memory commands in the wide-data mode; and a second command connection of a second command width, less than the first command width, to convey the memory commands in the narrow-data mode.
 2. The memory module of claim 1, wherein the second command connection includes a third data connection to convey data in the narrow-data mode.
 3. The memory module of claim 2, wherein the data comprises write data.
 4. The memory module of claim 2, wherein the second data width is zero.
 5. The memory module of claim 1, wherein the second data connection conveys data in the narrow-data mode.
 6. The memory module of claim 1, the at least one memory-buffer IC further having a mode-memory register to receive a mode-memory signal indicative of one of the wide-data mode or the narrow-data mode.
 7. The memory module of claim 1, wherein the command port conveys single-ended commands in the wide-data mode and differential commands in the narrow-data mode.
 8. The memory module of claim 7, wherein the command port conveys differential data in the narrow-data mode.
 9. The memory module of claim 8, wherein the command port conveys no data in the wide-data mode.
 10. The memory module of claim 1, wherein the at least one buffer includes a command buffer coupled between the internal command interface and the external command interface, and data buffers connected between the internal data interface and the external data interface.
 11. The memory module of claim 1, wherein the configurable command port includes connections that can be configured to communicate single-ended or differential signals.
 12. The memory module of claim 1, further comprising a second module connector, the at least one memory-buffer IC including a second external command interface coupled to the second module connector to forward external commands to a second module.
 13. The memory module of claim 1, wherein the IC memory devices comprise volatile memory for storing the data in the narrow-data mode, the memory module further comprising non-volatile memory coupled to the at least one memory-buffer IC to store the data.
 14. The memory module of claim 13, the non-volatile memory storing the data in the wide-data mode.
 15. A memory system comprising: a command buffer to facilitate communication between a memory controller and dynamic random-access memory (DRAM), the DRAM to store write data and provide read data, the command buffer including: a configurable command port for receiving first memory-controller commands in a parallel-access mode, and for receiving second memory-controller commands and at least one of the read data and the write data in a serial-access mode.
 16. The memory system of claim 15, further comprising: a memory bus coupled to the first-mentioned command buffer and the memory controller; and a second command buffer to facilitate communication between the memory controller and second DRAM; wherein the second command buffer is connected to the memory controller via the first-mentioned buffer.
 17. The memory system of claim 16, wherein the second command buffer is connected to the memory controller via the first-mentioned command buffer in a daisy-chain configuration.
 18. The memory system of claim 15, further comprising a data buffer to convey the write data and the read data to and from the DRAM in the parallel-access mode.
 19. The memory system of claim 18, the command buffer to direct the write data and the read data through the data buffer in the parallel-access mode.
 20. The memory system of claim 15, further comprising non-volatile memory coupled to the command buffer to store the write data.